Identifying dynamical systems with bifurcations from noisy partial observation 
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Dynamical systems are used to model a variety of phenomena in which the bifurcation structure is 
a fundamental characteristic. Here we propose a statistical machine-learning approach to derive low- 
dimensional models that automatically integrate information in noisy time-series data from partial 
observations. The method is tested using artificial data generated from two cell-cycle control system 
models that exhibit different bifurcations, and the learned systems are shown to robustly inherit the 
bifurcation structure. 
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Various phenomena ranging from climate change to 
chemical reactions have been modeled extensively by dy- 
namical systems [l|, Q, and the relevance of dynamical 
systems to modeling biological phenomena is being in- 
creasingly recognized 0, 0] ■ Recent advances in exper- 
imental techniques such as live-cell imaging that clari- 
fies molecular activities at high spatiotemporal resolu- 
tions 0-0] have accompanied this recognition. However, 
noise, partial observation, and a low controllability are 
still challenges for measuring biological systems in that 
both the system dynamics and measurement processes 
are highly stochastic, only a few components in a system 
are observable, and only a small number of experimental 
conditions can be examined. These difficulties have led 
to models being constructed from experimental observa- 
tions in a way that is often ad-hoc and semi-quantitative 
at best because instructive criteria and practical meth- 
ods have not yet been established for deriving the model 
equations by systematically integrating the information 
in the experimental data. 

To model complex systems such as cellular processes, a 
full description of all the systems details is often imprac- 
tical and not informative. Instead, a reduced descrip- 
tion that preserves the essential features of the system 
is more useful for comprehension., i.e., models described 
by low-dimensional dynamical systems are sufficient for 
explaining experimental observations. In particular, the 
bifurcation structure is a fundamental feature of dynam- 
ical systems since it characterizes the qualitative changes 
of the dynamics. Thus, identification of low-dimensional 
model systems that inherit the original bifurcation struc- 
ture is a crucial step in understanding the dynamics. 

Here we propose a statistical machine-learning ap- 
proach to automatically derive the low-dimensional 
model equations from single-cell time-series data ob- 
tained at a few conditions (i.e., bifurcation parameter 
values; Fig. [!}. Techniques for learning nonlinear dy- 
namical systems from time-series data have been em- 
ployed for chaos 8, 9], spatiotemporal patterns 
and multi-stable systems Only a few studies have 

applied the technique to biological data Hil ]. In a 



similar manner to some of those studies [l5| , we em- 
ploy a statistical technique to deal with the noisy and 
partial time-series data. However, rather than aiming to 
fit the model parameters to the observation, we obtain 
the low-dimensional model equations that inherit bifur- 
cation structure of the full system to capture the basic 
nature of the observed system. The performance of the 
method is demonstrated by using artificial data. 

We introduce a nonlinear state space model composed 
of state and observation equations that describe the sys- 
tem dynamics and observation process, respectively. We 
consider a system that is modeled by a ZJ-dimensional 
stochastic differential equations, and d components in 
the model can be simultaneously observed. The state 
equations are discretized in time by the Euler-Maruyama 
scheme [lj| . We write the time evolution of the zth vari- 
able at a time point t, x\{i = 1, . . . , D), as 



x' +1 = x\ 



Atf l ({x t 1 },s) + <j^tVAi, 



(1) 



where At is an integration time, cr, is the intensity of the 
system noise, and s is a bifurcation parameter. System 
noise Q is sampled from a standard normal distribution. 
To achieve efficient learning, the function /j is considered 




400 



time 



State equation Observation equation 

x =f (x,s) + noise y = g( x ) + noise 

FIG. 1. Schematic representation of the proposed method. 
Time-series data of the molecular activities in individual cells 
are obtained from measurements under a given input level 
s, which is regarded as a bifurcation parameter of the sys- 
tem. System and observation equations are then trained to 
reproduce the time-series data. 
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to be expressed by a summation of linearly independent 
functions as /^{x*}^) = J2n* Kfi (i x j}' s )i where iVj 
is the number of parameters {fc™} and functions {/"}. 
Since our aim is to reproduce the bifurcation structures 
of systems subjected to unknown equations, we adopt a 
polynomial base for the {/"}, rather than biochemically 
realistic functions like the Michaelis-Menten equation. 

The observation value of the i th component at a time 
point r, yl{i = 1, . . . , d), is written as 



II, 



i(x r i) + m4> r i 



(2) 



where rji is an observation noise intensity, and cf>l is 
sampled from a standard normal distribution. In gen- 
eral, a set of observed time points is a part of the 
entire set of time points in the numerical integration. 
Hereafter, 9 indicates the parameters to be estimated: 

The learning of dynamical systems is formulated as a 
maximum likelihood (ML) estimation, which is summa- 
rized below (further details arc given in the section 1 in 
the Supplemental Material). The likelihood is given by 
the conditional probability of the observed time series Y 
as a function of the model parameters 9. However, a 
straightforward maximization of the likelihood p(Y\0) is 
difficult because it requires the untractable summation 
of p(Y\X, 9)p(X\9) with respect to the time series of the 
state variables X. Thus, we employ the EM algorithm to 
maximize the log likelihood of a model by a two-step it- 
erative method that alternately estimates the states and 
parameters 17[. In the first step, the E step, the poste- 
rior distribution of the time series of a state (p(X\Y, 9)) 
is estimated based on the tentative parameter set 6* m. 
In the second step, the M step, the expectation value of 
logp(X, Y\9) is calculated as 

Q(Moid) = (logp(X,Y\6)) p(xlYi0old) , (3) 
and the parameter estimation is updated as 



argmaxQ(0,0 o i d ). 



(4) 



In this step, the optimization problem is reduced to lin- 
ear simultaneous equations and thus can be solved eas- 
ily. However, the problem in the E step is still analyt- 
ically unsolvable because the probability distribution of 
the time series is necessary. This calculation requires a 
state estimation at all time points including the points 
at which measurements are not conducted. We therefore 
obtain a numerical approximation of p(X\Y,9) using a 
particle filter algorithm that performs state estimations 
of nonlinear models using a Monte-Carlo method 
The particle filter (a numerical extension of Kalman fil- 
ter) approximates a general non-Gaussian state distribu- 
tion as a set of particles representing samples from the 
distribution and evaluates the log likelihood of the mod- 
els. Since the use of the particle filter introduces stochas- 
ticity into the learning algorithm, a slight modification of 



the M step is required to ensure convergence of the learn- 
ing [2(J. The optimization function in eq. (J4j) is replaced 
by<3/(#) = (l-a I )Q' I _ 1 (9)+a I Q{9,9 old ),where I is the 
iteration index, and {a/} is a sequence of non-increasing 
positive numbers converging to zero. 

To validate the method, we apply it to artificial data 
generated from models of a eukaryotic cell cycle control 
system since this system provides an illustrative example 
of cell ular dy namics composed of many molecular compo- 
The cell cycle is a fundamental biological 
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nents 

process characterized by repeated events underlying cell 
division and growth in which key proteins, Cyclin and 
Cyclin-dependent kinases, change their concentration pe- 
riodically and activate various cellular functions such as 
DNA synthesis. 

Two molecular circuit models of the cell-cycle control 
system in Xenopus embryos are adopted as the data gen- 
erators: that propos ed by Tyson and co-workers (the 
Tyson model) [2l|, l22|] , and that proposed by Ferrell and 
co-workers (the Ferrell model) [23J, l24j. Although both 
models show an oscillation onset as the synthesis rate of 
Cyclin increases, they differ in the type of bifurcation 
at the onset; the Tyson model exhibits a saddle- node 
bifurcation on an invariant circle (SNIC), while the Fer- 
rell model exhibits a supercritical Hopf bifurcation. We 
investigate whether the proposed learning procedure re- 
produces the correct bifurcation types of each model. 

Both data generators are composed of 9 variables in- 
cluding Cdc2, Cyclin, and other regulatory proteins. The 
time-series data is generated by a numerical calculation of 
these models as nonlinear Langevin equations at a few pa- 
rameter values (see the section 2 in the Supplemental Ma- 
terial for the model equations, and obtained time-series 
data). We simulate noisy observation by adding Gaus- 
sian noise to each observation value. Artificial data are 
prepared for three Cyclin synthesis rates across the bifur- 
cation point, and for each bifurcation parameter value, 
three independent time series are prepared in which the 
oscillation exhibits a large fluctuation in amplitude and 
period among the samples. 

Considering a polynomial of degree M, we write the 
system equations to be learned as 



f i ({x t J },s) = kl+kf: 



+ --- + fcfV D ) M (5) 



The observation equations are expressed simply as y\ = 
x\ + i]i4>i- We consider the active Cdc2 and Cyclin con- 
centrations to be observable variables since their levels 
have been observed in previous experiments [3] ■ Accord- 
ingly, yi (xi) and 2/2 (#2) represent the observed (true) 
concentrations of active Cdc2 and Cyclin, respectively. 
The other variables Xi (i > 2) represent the true con- 
centrations of unobservable components. In the system, 
the Cyclin synthesis rate, s, is a bifurcation parameter. 
We take the constant term in the equation for Cyclin to 
be the synthesis rate, i.e., k\ = s. Note that the ob- 
served orbit in the active Cdc2-Cyclin plane exhibits no 
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a) Tyson model 




(b) Ferrell model 
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FIG. 2. The AIC and BIC are plotted against different poly- 
nomial orders M = 1, 2, 3, 4 for the (a) Tyson and (b) Ferrell 
models. For each evaluation, 20 different initial parameters 
are sampled to avoid the local minima. After the learning, 
the average and standard deviation are calculated from 100 
evaluation trials based on the particle filter. 



intersection, (see Fig. SI in the Supplemental Material) 
suggesting that the two variables are sufficient to abstract 
the original high-dimensional dynamics. 

The simplest polynomial form required for reproducing 
the observed dynamics is determined by starting with lin- 
ear equations composed of active Cdc2 and Cyclin (sys- 
tem dimension D — 2 and polynomial order M = 1) and 
increasing the D and M by one. It turns out that D = 2 
is sufficient for reproducing a given time-series data set 
as shown below. The polynomial order M is determined 
by minimizing the information criteria through an opti- 
mization of the balance between the goodness of fit and 
2(1 127|. The Akaike information 



the model complexity 
criterion (AIC) and Bayesian information criterion (BIC) 
are evaluated from the log likelihood, parameter number, 
and data size for each model (FigJ5]). Both the AIC and 
BIC show a decrease from M = 1 to 3, but an increase or 
insignificant decrease at M = 4. Therefore, we analyze 
models with D = 2 and M — 3 (see the section 3 in the 
Supplemental Material for the learned parameter values 
and detailed settings in the learning algorithm). 

To check whether the learning procedure can capture 
the bifurcation of the original data generator system, 
we compare the bifurcation diagrams of the learned sys- 
tems with those of the data generators. Figures [3ja) 
and (b) show bifurcation diagrams against Cyclin syn- 
thesis rate s (red lines) for the learned systems in the 
Tyson and Ferrell models, respectively. The bifurcation 
diagrams for the corresponding noiseless data generators 
are shown by the gray lines. Although the data for the 
learning are given only at three bifurcation parameter 
points (indicated by the broken lines), the learned sys- 
tems have quantitatively similar diagrams to those of the 
corresponding data generators. The sudden appearance 
of a limit cycle with finite amplitude is reproduced for 
the Tyson model, while the gradual increase in ampli- 
tude from the bifurcation point is reproduced for the 
Ferrell model. These features are characteristics of the 



SNIC and supercritical Hopf bifurcation. Nullclines of 
the learned systems in the vicinity of the bifurcation 
points are shown in Figs. [3](c) and (d) for the Tyson 
model and in Figs. (He) and (f) for the Ferrell model. 
The results confirm the onset of SNIC and supercritical 
Hopf bifurcation, respectively. Thus, each learned sys- 
tem inherits the bifurcation type of the original model 
through the learning procedure in spite of noisy and par- 
tial observations. 

When the learning is conducted by using the data on 
two of the three bifurcation parameter points, the learned 
systems still exhibit the correct bifurcation types, al- 
though the points of oscillation onset and amplitudes are 
biased (Figs. EJg) and (h)). Note that identification of 
bifurcation is possible even by using the data only on 
one side of a bifurcation point (as indicated by the green 
lines). These results indicate the interesting possibility 
that the learning procedure can predict the type of bifur- 
cation that will occur from the data before the bifurcation 
point only. 

We also show here how the high-dimensional phase 
space structures of the original data generators are 
mapped onto the lower-dimensional surfaces in the 
learned systems. Reduced two-variable models are de- 
rived by adiabatic elimination following a similar proce- 
dure by Novak and Tyson 28j (see the section 4 in the 
Supplemental Material for the detailed procedure and re- 
duced model equations). Like the learned systems, the 
reduced models are composed of active Cdc2 and to- 
tal Cyclin. Figure 0] shows the nullclines of the learned 
systems (the solid orange and purple lines) and the re- 
duced models (the broken lines). In both the Tyson and 
Ferrell models, the learned system and reduced model 
nullclines for active Cdc2 have a similar TV-shaped form 
(orange lines), indicating the existence of positive feed- 
back in the molecular circuits. In contrast, those for 
the total Cyclin disagree quite significantly. To check 
the consistency of the nullclines and dynamics, Fig. 0] 
also shows a noisy time series from the data generators 
(blue points) and the orbit of the learned system (red 
lines). The nullclines of the learned systems are consis- 
tent with the dynamics in the data but the reduced mod- 
els are not. This failure arises because the dynamics of 
a component mediating the inhibition from active Cdc2 
to Cyclin is not fast enough to allow the adiabatic ap- 
proximation. Higher-order contribution beyond the adi- 
abatic elimination performed here should be included, 
which requires complicated technical work. Neverthe- 
less, the learning process automatically reproduces the 
appropriate low-dimensional dynamics and estimates the 
bifurcation structures without knowledge of the detailed 
high-dimensional model systems. 

Gathering biological data is complicated by intrin- 
sic and observation noises, partial observation, and a 
small number of possible experimental conditions. We 
have outlined here a machine- learning procedure based 
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FIG. 3. Bifurcation diagrams of the (a) Tyson and (b) Ferrell 
models. The active Cdc2 concentration x\ for the learned sys- 
tems is plotted against the Cyclin synthesis rate s (red), and 
corresponding concentrations of the data generators are also 
shown for comparison (gray). The broken lines indicate points 
at which the data are given. For the Tyson model, there is 
another attractor with a tiny basin that is ignored, (c-f) The 
nullclines of the learned systems around the bifurcation points 
are shown. Purple and orange lines represent nullclines of x\ 
(active Cdc2) and xi (Cyclin), respectively, and the gray ar- 
rows indicate the flow direction. (c,d) The learned system 
from the Tyson model exhibits SNIC and (e,f) that from the 
Ferrell model exhibits a supercritical Hopf bifurcation. The 
values of the bifurcation parameter are (c) s = 0.0038, (d) 
0.0044, (e) 0.0005, and (f) 0.0012, respectively. (g,h) Bifurca- 
tion diagrams using the data at two of the three bifurcation 
parameter points. The learning that lacks data at the low- 
est, intermediate, and highest bifurcation parameter values 
are denoted by as low~ , middle - and high~ , respectively. 



on likelihood maximization that makes use of all the in- 
formation in the time-series data, including that in the 
noise. By using synthetic data that share the difficulties 
found in actual biological data, we demonstrated that the 
procedure could derive low-dimensional model equations 
that reproduced the obtained time-series data and cap- 
tured the bifurcation types of the original systems. These 
results support the conjecture that the learning proce- 
dure will be able to construct reliable low-dimensional 
models for real time-series data of active Cdc2 and Cy- 
clin levels in future studies. Being able to identify the 
model systems and bifurcation types will provide a useful 
method for elucidating both the molecular interactions in 
the circuit and the biological functions of the dynamics. 
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FIG. 4. Comparison of the learned systems and reduced 
models for the (a) Tyson and (b) Ferrell models. The purple 
and orange lines represent the nullclines of x\ (active Cdc2) 
and X2 (Cyclin), respectively, for the learned systems (solid 
lines) and reduced models (broken lines). A noisy time se- 
ries from the data generators (blue dots) and the orbits of 
the learned models (red lines) are also shown. The blue ar- 
rows indicate the flow direction. The values of the bifurcation 
parameter are (a) s = 0.006, (b) 0.0015. 

Further, since the dynamics and bifurcation are found 
widely among various biological processes, the method 
is expected to be applicable to various cell system with 
cell-imaging data. 

We note that the proposed procedure can be in- 
terpreted as a reduction method from high- to low- 
dimensional systems like adiabatic approximation. In 
particular, in the vicinity of the bifurcation points, the 
systems are usually reduced to normal forms represented 
by low-dimensional differential equations with low-order 
polynomial forms [29(. However, unlike analytical reduc- 
tion methods that require the original high-dimensional 
equations, the present learning procedure uses only the 
time-series data. This is especially advantageous for 
studying cell dynamics that involve complex molecu- 
lar interactions. On the other hand, since the learning 
method has a less theoretical basis for interpreting the 
obtained equations, it should be complemented by an 
analytical procedure. 

In essence, the proposed method performs quantitative 
inference of the phase space structures of the dynamical 
systems. Therefore, not only the bifurcation structure 
but also other properties of the dynamical system can be 
analyzed using the same theoretical groundwork devel- 
oped here. The detection of phase sensitivity from noisy 
data in studies of biological clocks [30( would provide an 
interesting application. In addition, the method is flexi- 
ble enough to be combined with other machine- learning 
techniques; it was recently shown that compressive sens- 
ing exhibits a high performance for learning dynamical 
systems Q. Those possible extensions will further im- 
prove our method depending on the situations and exper- 
imental setup. In summary, the proposed method will be 
an efficient way to capture the essential features of the 
cellular dynamics by mediating dynamical system mod- 
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eling with experimental observations. 
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1 Learning algorithm 

1.1 State space model 

^sj . We first introduce the state space model composed of the state equations and observation 
equations describing the system dynamics and observation process, respectively. Let us consider 
D-dimensional stochastic differential equations that describe a system, and d components in 
the system that are observed simultaneously. In the model, the state variable Xi(i = 1,...,D) 



o 

(N 



d 



evolves under the function fi({xj},s), where s represents an input to the system, and the 
<^ 1 observation value yi is obtained through the function g, L {{xj}). 

■ By discretizing the dynamics in time with the Euler-Maruyama scheme [1] , we can write 

£S| \ the space state model as 

= x t i + Atf i ({x t j },s) + a i $jAi, (1) 

y\ = gidx^ + ^l (2) 

where i(e T) and r(e R) are time points, At is an integration time, and Oi and r\i are the noise 
intensities in the dynamics and observation, respectively. Both and <j£ are sampled from a 
standard normal distribution. In general, the set of observed time points R is a subset of the 
entire time point set T (i.e., R^T) for the numerical integration. We assume that the function 
fi can be expressed by the summation of linearly independent functions (/"(^ = 1, . . . , Ni)) as 



> 1 N i 

©: /i({4}»*) = E (3) 

Here, {k™} are the coefficients to be estimated. 

Let us consider that A time-series data sets are given. The learning procedure estimates 
the parameters {k™}, {<Ji}, {f]i}, and all the true states {x-} for each time-series set. In our 
method, the initial condition for the ith component in the ath time-series set is assumed to obey 
a Gaussian distribution parameterized by the mean and the variance Vi A . Distributions 
at other points are automatically estimated by the particle filter algorithm explained below. 
Then, the parameters to be estimated are = ({kf}, {cij}, {rji}, {/ii, a }> {Vi,a})- 

1.2 SAEM algorithm 

Our aim is to find model parameters 9 by maximizing log likelihood function 

log L{6) = \ogp(Y\e) = log / p(Y\X, e)p(X\6). (4) 

Here, Y (= {Y a },a = 1, . . . , A) denotes the data sets of the A time series, and X (= {X a },a = 
1, . . . , A) denotes the entire time series of estimated states. We employ an EM algorithm that 
maximizes log P(X,Y\9) (the complete-data log likelihood function), which is equivalent to 
maximizing the likelihood in eq. (|3])[2"]. By iterating two steps known as the E and M steps, 
the states X and parameters 6 are estimated alternately. Since our implement of the E step 
includes the Monte-Carlo method as described below, stochastic approximation EM (SAEM) 
algorithm is adopted [3]. The SAEM procedure is described as follows. 
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SAEM — 

1. Initialize the parameter vector 9 = 6q, and set the iteration number I to zero. 

2. (E step) Calculate the posterior distribution of the entire time series of state variable 
p{X\Y,9). 

3. (M step) Rename 9 as # id> and update the parameter vector as 

9new = argmaxQ/(6>) (5) 
6 

n = J Q(9,9 old ) (1 = 0), 

HI \ (l-c^Qi-i^ + a/Q^floid) (I>0), W 

where 

Q(9,9 old ) = (log P (X,Y\9)) p{xlYfiold) , (7) 
and {a/} is a non-increasing sequence of positive values converging to zero. 

4. Increment I by one, and iterate steps 2 and 3 until the estimation of the parameter vector 
converges. 



The details of the E and M steps are described in the following sections. 



1.3 E step 

Since different time-series data are independent stochastic variables, we can write 

\ogp(X\Y,9) = p(X a \Y a , 9). (8) 

a 

Then, each log p(X a \Y a , 9) is evaluated by using a particle filter algorithm that approximates 
the non-Gaussian distribution of the state x\ as a collection of many particles, each of which 
represents a sample from the distribution [H [5]. Specifically, the algorithm required here 
is called a particle smoother. For the ath time series, let xf a denote the pth particle for 
representing xj, and let y\ a denote an observed value at time t. The procedure of the particle 
smoother is described as follows. 



Particle smoother 



For a = 1,...,A 

1. For i = 1,...,D and p = 1,...,P, set the initial states as x^ ~ N(^i >a ,Vi ja ), and 
normalize the weights as ufa = 1/P 

2. At each t = 0, 1, . . . 

• For i = 1, . . . ,D and p = 1,...,P, calculate x^ 1 ^ from x\ v a by using the state 
equations. 

• If t = r S R, update the weights of the particles as 

- a ' olda ( 9 ) 

p rr,p i 



where 



^2p W a,o\J a 



17 = J[p{yla\W- p a }). (io) 
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If P e ff = 1/ J2 P ( w a) 2 < Pthres (i-e-j if effective number of the particles fall below 
a threshold value), resample the particles according to the new weights. Note that 
the history of particles (x°'£ , x\^, . . . , x r i ~ 1 ' p ) is resampled in parallel. 



Finish when all data points have passed (t = max(R)), and estimate the log likelihood as 

p 

P 



i og L a (0) = j>gdf;z7). (ii) 



reR 



Then, the posterior distribution of the time series of the state is approximated as 

p 

p(X a \Y a , old ) = < S ( X a ~ X5), (12) 
P =i 

where X% indicates a sample path ({x^ p a },i = 1, . . . ,D,t G T). On the basis of this approxi- 
mation, we calculate the average of the complete-data log likelihood as 

Q(Moid) = {\o gP {{x a },{Y a }\9)) p{{XaM{Ya}fiold) (13) 

A P 

= EE^^fK' 7 ^) ( 14 ) 

a=l p=l 

A p D ( 1 (x°' p -u- ) 2 \ 



a=l p=l i=l 
A P 



" ( 1, , , 2 ((C' P -4D-AtEnilW({4a}))^ 



+ EEEE<|-> 2 ^) 2 - 

a=l p=l teT i=l 



2 * K l > 2(ai) 2 At 



+ LLLL< --io g 27r(Tfc) -- 



a=l p=l rgR i=l 



1.4 M step 

At the ith iteration, the parameter-value update is performed by finding the 9 for which 
^Qi{9) = 0. We describe the case of -$)Q(0, 9 old) = for simplicity, although the optimization 
problem can be solved generally by the same method. The following example demonstrates 
the determination of parameters of the system dynamics and the strength of the system 
noise (cr^) in detail. 

First, by differentiating the complete-data log likelihood with respect to k™, we obtain 

9 ;Q(e,e old ) (16) 




a p teT 



off 1 * - jg - A ^n ; fcrgojgj A^nijg)' 

2(*i) 2 At 



Ni 

= E <( A 4a/r({4a») - (E k i E </r({4a»/r({4a}))' 

a,p,t n a,p,t 
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where Ax*'^ = (x^ 1 ' 13 — x*'^)/Ai. By defining the vectors bi and k[ and a matrix Ai as 

fak = E<( A 4a/r({4a») ( 17 ) 

(*!l)»n = *f ( 18 ) 

(A)« m = E</r({4;a})/r({4 ,p j)> (19) 

a,p,t 

the system dynamics parameters are determined as follows. 

h = (AiY l bi. (20) 
Next, using the new kf calculated above, we obtain 



and thus, 

(a,) 2 = At £ tog (a** - £ W({4,a})) ■ ( 22 ) 

a,p,t \ n / 

The other parameters are estimated in the same manner. Only for the variance of the initial 
condition V* we define the minimum value V m i n to avoid an unnaturally small value resulting 
from a problem called sample impoverishment [6]. 



2 Data preparation 

The model equations and parameter values used in the present study as the data generators are 
described here. Each data generators, the Tyson and Ferrell models, is numerically integrated 
with white Gaussian noise by using a stochastic Runge-Kutta II (SRKII) algorithm [7j. To 
prohibit negative values of the chemical concentrations as a result of noise, each variable is 
reset to a small positive value e (= 0.001) when the value is less than e. We confirmed that the 
results of the present study are stable so long as e is sufficiently small. 

The Tyson model described in Novak & Tyson [8]. 

ki [AA] - fc 2 [cyclin] - k 3 [Cdc2] [cyclin] + ft (t) (23) 

kpp [Cdc2-cyclin-tp] — (k wee + k ca k + fc 2 )[Cdc2-cyclin] (24) 
k c dc25 [Cdc2-cyclin-yp] + fc 3 [Cdc2] [cyclin] + £ 2 (t) 
k wee [Cdc2-cyclin] - {k cdc25 + k cak + /c 2 )[Cdc2-cyclin-yp] (25) 
kpp [Cdc2-cyclin-yptp] +&(*) 

k wee [Cdc2-cyclin-tp] - (k PP + k cdc25 + k 2 ) [Cdc2-cyclin-yptp] (26) 
k cak [Cdc2-cyclin-yp] + £ 4 (i) 



d [cyclin] 
dt 

d[Cdc2-cyclin] 



d[Cdc2-cyclin-yp] 
dt 



d[Cdc2-cyclin-yptp] 
dt 
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G?[Cdc2-cyclin-tp] 
dt 

d[Cdc25*] 



dt 



d[APC* 
dt~ 

d\WeeV 
dt 



ri[IEP* 
dt 



k c dc25 
kwee 

k 2 

[Cdc2] 



k C ak [Cdc2-cyclin] — (kpp + k wee + fc 2 )[Cdc2-cyclin-tp] 

fc c dc25[Cdc2-cyclin-yptp] + £ 5 (i) 

([Cdc25 tot ] - [Cdc25*]) 



k a [Cdc2-cyclin-tp] 
kb [PPase 
fc c [IEP* 



K a + ([Cdc25 tot ] - [Cdc25*]) 
[Cdc25*] 



X 6 + [Cdc25*] +€e(t) 
([APC tot ] - [APC*]) 



= k e [Cdc2-cyclin-tp] 



K c + ([APC 4o t] — [APC*]) 

([Wee!**] - [Weel*]) 



k c i [anti-IE] 



[APC* 



K d + [APC*] 



K K 



k f [PPase] 



[Weel* 



K f + [Weel' 



([Weel**] 
■ + &(*) 



[Weel*]) 



= fc g [Cdc2-cyclin-tp] 



([IE tof ] - PEP*]) 



[IEP* 



([IE tot ] 

Kn(t) 



[IEP*]) 



1 ^ + [IEP*] 
Kdc2 5 '([Cdc25 tot ] - [Cdc25*]) + y c(ic2 5»[Cdc25*] 
K, ee ,[Weel*] + K, ee "([Weel tot ] - [Weel*]) 

([APCtot] ~ [APC*]) + V 2 »[APC*] 
[Cdc2 tot ] - [Cdc2-cyclin] - [Cdc2-cyclin-yp] 
[Cdc2-cyclin-yptp] — [Cdc2-cyclin-tp] 



(27) 
(28) 

•£r(i) (29) 
(30) 

(31) 



(32) 
(33) 
(34) 
(35) 



Noise terms are represented by £i(t) with white Gaussian statistics = and (<£i(t)£j(r)) 



2of tfijtfCt - r) 



2.1 Ferrell model 

The Ferrell model is described in Tsai et al. [S]. 
d[cyclin 



dt 

d[Cdc2-cyclin] 
dt 



d[Cdc2-cyclin-yp] 
dt 



tf[Cdc2-cyclin-yptp] 
dt 



= k synth - fc dest [APC*][cyclin] - k a [Cdc2] [cyclin] + fc d [Cdc2-cyclin] + £i(f) (36) 

= fc a [Cdc2] [cyclin] - fc d [Cdc2-cyclin] - fc dest [APC*][Cdc2-cyclin] (37) 

- *wi[Weel*][Cdc2-cyclin] - k weelbasa i([Weel to t} - [Weel*])[Cdc2-cyclin] 

+ A: cdc 2 5 [Cdc25*][Cdc2-cyclin-yp] + fc C rf C 25fcasa/([Cdc25 tot ] - [Cdc25*])[Cdc2-cyclin-y] 

+ Ut) 

= fc u)ee i[Weel*][Cdc2-cyclin] + fc„, ee i 6QsaZ ([Weelt ot ] - [Weel*])[Cdc2-cyclin] (38) 

- A: C(ic 25[Cdc25*][Cdc2-cyclin-yp] - k C dc25basai([Cdc25 t ot} - [Cdc25*])[Cdc2-cyclin-yp] 

- k cak [Cdc2-cyclin-yp] + k pp2c [Cdc2-cyclin-yptp] - k des t [APC*] [Cdc2-cyclin-yp] 

+ Ut) 

= A: cafc [Cdc2-cyclin-yp] - fc pp 2c[Cdc2-cyclin-yptp] (39) 

- A;cd C 25[Cdc25*][Cdc2-cyclin-yptp] - fccdc25fca S az([Cdc25 tot ] - [Cdc25*])[Cdc2-cyclin-yptp] 

[Weel*][Cdc2-cyclin-tp] + 

^weelbasal ([Weeltot] - [Weel*])[Cdc2-cyclin-tp] 

- fc desi [APC*][Cdc2-cyclin-yptp] + a(i) 
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G?[Cdc2-cyclin-tp] 



k C dc25 [Cdc25*] [Cdc2-cyclin-yptp] (40) 

fcCdc25basal 

([Cdc25 tot ] - [Cdc25*])[Cdc2-cyclin-yptp] 
fc wee i[Weel*][Cdc2-cydm-tp] - k weelbasa i([Weel tot ] - [Weel*])[Cdc2-cyclin-tp] 
fc dest [APC*][Cdc2-cyclin-tp] + 



dt 



rf[Cdc25*] 
dt 



ri[Weel*] 
dt 



ri[APC*] 
dt 



d[Plx*] 
dt 



+ 



+ 




[Cdc2] = [Cdc2 to J - [Cdc2-cyclin] - [Cdc2-cyclin-yp] - [Cdc2-cyclin-yptp] - [Cdc2-cyclin-tp] (45) 

Noise terms are represented by with white Gaussian statistics = and {^i(t)^j(r)) = 

2af6 itj S(t-T). 

2.2 Artificial measurement process 

For both the Tyson and Ferrell models, the artificial measurement process is implemented as 
follows. 



Here, rjip is the observation noise intensity, and (f)i t 2 is sampled from a standard normal dis- 
tribution. 



2.3 Parameter values 

The parameters used in the Tyson and Ferrell models are listed in Tables £fl] and 321 respec- 
tively. In the Tyson model, all the parameter values are the same as those in the original 
literature [8] . In the Ferrell model, all the parameter values are the same as those in Pomeren- 
ing et al. [10] except for "factor" which is set as in [9]. 

2.4 Orbits of the data generators 

Figure 32 show noiseless orbits of the data generators. We note that each orbit exhibits no 
intersection, indicating that the two observable variables are sufficient to abstract the original 
high-dimensional dynamics. 

2.5 Data set 

The data used for the learning are shown in supplemental Fig. 321 



active Cdc2 



[Cdc2-cyclin-tp] + r\\§\ 



(46) 



[Cdc2tot] 



Total Cyclin 

([cyclin] + [Cdc2-cyclin] + [Cdc2-cyclin-yp] + [Cdc2-cyclin-yptp] 
+ [Cdc2-cyclin-tp] +r ? 20 2 )/[Cdc2 tot ]. 



(47) 
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3 Settings and results of learning 

Parameters in the learning algorithm is listed in Table 33 The learned parameters of the 
third-order polynomial model used in the main text is shown in Table £|H 



4 Model reduction 

We reduce the Tyson and Ferrell models to two-dimensional systems by the same procedure as 
described in [11]. By denoting the non-dimensionalized active Cdc2 and total Cyclin levels as 
u and v, respectively, the reduced models are written as follows. 
The reduced Tyson model takes the form as 

s v 

U = —— j (/APC(«) + /Wee(«))« + /Cdc25(>)(-— ~. j u), (48) 

1 + Kp P /k cak 1 + kp P /k cak 

v = s - fAPc(u)v, (49) 

where /cdc25> /wee, and /apc are the functions corresponding to the adiabatic solutions of eqs. 
d32D, d33D, and dS]), respectively. 

This reduction procedure includes the determination of the level of Cdc2-Cyclin-tp (i.e., 
the value of u) from the sum [Cdc2-Cyclin] + [Cdc2-Cyclin-tp] (see Appendix A in [11]). This 
is based on a detailed balance assumption of the phosphorylation reaction between the two 
molecular species. However, in the Ferrell model, the absence of the reaction makes the original 
reduction procedure inapplicable. Then, we simply assume [Cdc2-Cyclin-tp] ~ [Cdc2-Cyclin] + 
[Cdc2-Cyclin-tp], because it is observed that the ratio of [Cdc2-Cyclin] to the summation is 
small throughout the dynamics within the parameter region we consider. Consequently, the 
reduced Ferrell model is written as 

u = s - (f A pc(u)) + fwcc{u))u + f Cdc25 {v - u), (50) 
v = s - fAPc(u)v, (51) 

where /cdc25> /wee, and /apc are derived from the adiabatic approximation in the same manner 
as the Tyson model. 
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Parameter 



Parameter 



/ci[AA]/[Cdc2 tot ] s k syn s 

A; 3 [Cdc2 tot ] 1.0 k dest 0.006 

k C AK 0.25 k a 0.1 

k PP 0.025 k d 0.001 

V 2 <[APC tot ] 0.015 factor 10 

^2»[APC tot ] 1.0 k weel 0.05 

V cdc 25> [Cdc25 tot ] 0.1 k weelbasa i k weel / factor 

V cdc2 5" [Cdc25 tot ] 2.0 k C dc25 0.1 

K,ee' [Weel tot ] 0.1 k C dc25basai k cdc25 / f actor 

V wee » [Weel tot ] 1.0 k cak 0.8 

fc a [Cdc2 to t]/[Cdc25 tot ] 1.0 k pp2c 0.008 

fc 6 [PPase]/[Cdc25 tot ] 0.125 k Cdc25on 1.75 

A: c [IE tot ]/[APC tot ] 0.1 k cdc 25off 0.2 

fc d [anti IE]/[APC tot ] 0.095 k weelon 0.2 

fc e [Cdc2 tot ]/[Weel tot ] 1.33 Awio// 1.75 

fc/[PPase]/[Weel tot ] 0.1 k plxon 1.0 

fc g [Cdc2 tot ]/[IE tot ] 0.65 fc pteo// 0.15 

fchpPascj/flEtot] 0.087 A: apco „ 1.0 

if a /[Cdc25 tot ] 0.1 k apcoff (115 

AV[Cdc25 tot ] 0.1 ' EC50 Cd c25 40 

K c /[APC tot ] 0.01 EC50„, ee i 40 

K d /[APC tot ] 0.01 EC50 pte 40 

K e j [Weel tot ] 0.3 EC50 apc 40 

K f /[Weel tot ] 0.3 n cdc25 4 

if s /[IEtot] 0.01 n weel 4 

^/[IEtot] 0.01 

<Ti,.... 5 /[Cdc2 tot ] 0.0024 n apc 3 

CT 6 /[Cdc25t t] 0.0024 Cdc25 tot ~" 15 

a 7 /[APC tot ] 0.0024 Weel to t 15 

(J 8 /[Wee tot } 0.0024 Plx to t 50 

a 9 /[IEP to t] 0.0024 APC tot 50 

77i/[Cdc2 tot ] 0.005 Cdc2 tot 230 

?? 2 /[Cdc2t t] 0.005 <7i,..., 9 0.06 

771 0.4 

7?2 OA 



Table SI: Parameters in the Tyson model Table S2: Parameters in the Ferrell model 



Parameters in the learning algorithm 



Number of time series A 
Integration time At 
Entire-time points T 
Observed-time points R 
Decreasing sequence in SAEM aj 
Particle number P 

Particle number threshold for resampling Pthres 

Initial system dynamics parameters {fc™} 

Initial system noise strength <7j 

Initial observation noise strength r\i 

Minimum value of initial condition variance V m in 
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1.0 

{0,1, 2,..., 400} 

{0,2, 4,..., 400} 

1 {I < 30), l/V(7-30) (otherwise) 

1000 

500 

uniform distribution [-0.001,0.001] 

0.1 

0.2 

0.001 



Table S3: Pre-determined parameters used in the learning algorithm. 
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Parameter 


Learned (Tyson model) 


Learned (Ferrell model) 


k[ 


0.00064 


-0.00032 




-0.334 


-0.343 


kf 


0.0655 


0.0657 


i -1 

A:f 


-0.8 


1.05 


fcf 


2.39 


-1.25 




-0.639 


0.107 




-25.3 


-47.3 


fcf 


31.1 


47.8 


i Q 

*a 


-13.1 


-3.97 




2.22 


-0.157 


"2 


-0.4 


-0.57 




0.0612 


0.0637 


i 4 
A'o 


-4.62 


-0.9 


I iTj 
"2 


5.47 


2.41 


l-f> 
"2 


-U.oUO 


n nnnei c 
U.UUUolo 


k 2 


-9 


61.4 


k' 2 


22.7 


-61.3 


k 2 


-15.3 


18.3 


k 2 


1.92 


-2.73 


v\ 


0.00646 


0.00199 


c 2 


0.00703 


0.00205 




0.00334 


0.00101 


?72 


0.00621 


0.00133 



Table S4: Learned parameters of the third-order polynomial systems for the Tyson and Ferrell 
models. 



O 
o 




0.1 0.2 0.3 
Active Cdc2 

(a) 



0.4 



0.25 
0.2 

I 0-15 
o 

1 0.1 
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0.05 





s=0.0015 

s=0.001 

s=0.0005 
i 



0.05 
Active Cdc2 



0.1 



(b) 



Figure SI: Time series generated from the (a) Tyson and (b) Ferrell models through the 
artificial measurement process in the case that both system and observation noises are zero. 
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Figure S2: Noisy time series generated from the (a-f) Tyson and (g-1) Ferrell models through 
the artificial measurement process. The colors indicate results from independent trials. The 
values of the bifurcation parameter are (a,d) s = 0.002, (b,e) 0.005, (c,f) 0.008, (g,j) 0.005, 
(h,k) 0.001, and (i,l) 0.0015, respectively. 
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